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SUMMARY 


A  mathematical  model  is  described  which  forms  the  basis  of 
a  spectral  enhancement  program  that  follows  along  the  lines 
of  the  MAZE  series  of  spectral  unfolding  codes.  The  treatment 
of  a  priori  and  a  posteriori  information  functions  are  elaborated 
as  well  as  their  optimization  and  subsequent  implementation. 

The  code  was  developed  for  a  serial  computer  and  the  results 
are  demonstrated  for  several  examples.  The  code  has  also 
been  converted  to  the  Illiac  IV  parallel  processor  computer 
system  in  the  Glypnir  language. 
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1.  INTRODUCTION 


The  MAZE  series  of  computer  programs  are  general  purpose 

estimation  codes  that  are  and  have  been  used  for  unfolding,  resolution 

enhancement,  smoothing,  interpolation  and  fitting.  MAZEl^  and 

(2) 

MAZE2  have  been  sponsored  by  the  Defense  Nuclear  Agency.  This 
report  describes  the  development  of  an  improved  version  of  MAZE2., 
now  designated  MAZE3  and  the  subsequent  conversion  of  this  improved 
version  of  Illiac  IV  (14)  which  is  designated  MAZE4.  It  should  be 
emphasized  that  this  effort  was  not  strictly  a  conversion  of  an 
existing  computer  program  from  a  serial  computer  to  the  14  computer. 
Rather,  there  was  a  great  deal  of  emphasis  placed  on  introducing  the 
latest  information  theory  techniques  into  MAZE3  and  debugging  a 
code  that  would  work  very  well  for  serial  computers.  Only  after 
this  was  accomplished  was  the  code  then  fitted  to  the  14  computer  to 
take  advantage  of  the  unique  computational  advantages  provided  by 
that  system. 

The  14  is  a  parallel  computer  system  basically  consisting  of  a 
Control  Unit  (CU)  and  64  Processing  Elements  (PE's).  The  PE's 
work  in  parallel  and  can  execute  64  instructions  simultaneously. 

The  CU  can  also  be  executing  instructions  at  the  same  time.  The 
obvious  advantage  of  the  14  over  serial  computers  is  the  simultaneous 
execution  of  up  to  64  instructions;  however,  they  have  to  be  the  same 
instruction.  In  each  PE  the  instructions  operate  on  data  elements 
within  the  individual  PE's  own  memory. 


Surrounding  the  14  system  is  a  great  deal  of  peripheral  storage 
which  will  be  useful  when  MAZE  is  being  used  for  unfolding  a 
spectrum  with  a  large  number  of  channels.  It  is  possible  (although 
perhaps  not  very  practical,  as  will  be  described  later)  to  parameterize 
the  response  data  from  Ge(Li)  detectors  in  order  to  minimize  the 
size  of  the  response  matrix.  However,  the  manner  in  which  MAZE 
is  expected  to  be  used  on  14  is  that  the  full  response  matrix  will  be 
necessary.  Thus,  the  use  of  the  peripheral  storage  (14  disk)  is 
imperative.  Also,  since  the  computation  time  goes  up  as  the  square 
of  the  increased  number  of  channels,  it  is  evident  that  the  14  system 
can  be  an  effective  tool  for  unfolding. 

Section  2  describes  the  principles  of  unfolding  which  form  the 
basis  of  the  mathematical  model.  Section  3  elaborates  upon  a  specific 
form  of  the  optimization  function  and  Section  4  explains  the  method 
of  implementation.  Section  5  has  some  comments  on  the  parameter¬ 
ization  of  the  Ge(Li)  response  function,  while  Section  6  contains 
some  specific  examples  of  unfolding. 
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2.  PRINCIPLES  OF  UNFOLDING 


The  underlying  mathematical  principles  of  unfolding  are  well 
(1  2) 

covered  elsewhere,  ’  nonetheless  parts  of  it  will  be  reproduced 
here  for  completeness. 


The  system  consisting  of  source,  spectrometer,  unfolding 
processor,  and  receiver  involves  the  following  flow  of  information: 


The  object  spectrum  o  (E),  is  the  energy  distribution  of  neutrons  or 
gammas  from  the  source.  It  is  viewed  by  the  spectrometer,  which 
produces  a  data  vector,  T  has  passed  through  a  complex  detection 
process  and  suffers  from  Iocs  of  resolution,  from  parti  :le  recoil 
effects,  and  from  noise  due  to  statistical  fluctuations  of  he  discrete 
count  data.  It  is  the  job  of  the  unfolding  processor  to  minimize  the 
effects  of  the  spectrometer  response  and  the  count  fluctuations,  and 
to  construct  from  the  data  an  image  spectrum,  o(E),  that  corresponds 
closely  to  the  true  object,  transmitting  this  image  spectrum  to  the 
receiver. 

One  way  of  viewing  the  unfolding  processor  is  in  terms  of  its 
objective.  How  closely  the  image  function,  o,  corresponds  to  the 
object  function,  o,  is  a  measure  of  the  effectiveness  of  the  unfolding 


processor.  Generally  there  is  a  distance  function,  3(0,0),  which  tells 
how  far  a  given  is  separated  from  a  given  <P .  Given  the  true  object 
the  optimum  image,  V3,  is  the  one  for  which  s(o,o)  is  as  small  as 
possible.  This  is  of  conceptual  interest  but  of  no  practical  utility, 
because  if  o  were  known  there  would  be  no  need  for  a  measurement. 
However,  one  can  construct  a  function  W  which  measures  the 
effectiveness  of  the  unfolding  processor,  but  which  is  a  function  only 
of  quantities  that  are  input  to  or  output  from  the  processor,  that  is 

W  =  W(»,<P)  . 

Then,  given  the  data  set,  ®,  the  optimum  image  would  be  the  one  for 
which  W(T,<P)  is  as  small  as  possiVe. 

Let  us  begin  with  a  more  precise  statement  of  the  optimization 
principle:  There  is  a  function  of  the  data  T1  and  ihe  image  estimate  0 , 

W(T,<P)  , 

such  that  for  a  given  data  set  the  optimum  image  under  the  W-criterion 
occurs  at  that  0  which  minimizes  W.  The  principle  does  not  say 
that  the  resulting  is  universally  optimum,  only  optimum  under 
the  criterion  defined  by  W.  It  remains  for  the  inventors  of 
W-functions  to  insure  that  they  have  a  precise  understanding  of  what 
constitutes  an  optimum  image  and  that  they  are  accurately  translating 
that  understanding  into  mathematical  forms. 

The  next  important  point  is  that  W  can  be  thought  of  as  an 
information  function.  Information  functions  are  related  to  probability 
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distributions  in  a  simple,  precisely  defined  way,  and  we  now  explain 
that  relationship. 

Information  has  to  do  with  reduction  of  the  unknown,  that  is  with 
the  specification  of  quantities  that  had  been  random  relative  to 
previously  acquired  knowledge.  The  shape  and  magnitude  of  an 
energy  spectrum  are  such  quantities.  Before  actual  measurement 
no  definite  statement  can  be  made.  At  best  a  probability  density 
can  be  assigned  to  a  given  shape  and  magnitude,  based  upon  known 
characteristics  of  the  source  and  known  distributions  among  popula¬ 
tions  of  sources.  Certain  classes  of  conceivable  configurations  may 
be  ruled  out  as  occurring  with  vanishingly  small  probability,  and 
this  fact  will  be  used  in  the  unfolding  process;  but  the  idea  that  a 
given  spectrum  is,  even  before  measurement,  of  definite  shape  and 
magnitude  is  of  no  practical  utility  when  these  parameters  are 
unknown. 

If  a  measurement  could  determine  these  parameters,  an  amount 
of  information  would  be  acquired.  The  amount  would  depend  upon 
the  rarity  of  the  spectrum  that  was  observed.  That  observation  of  a 
common  spectral  shape  conveys  a  smaller  amount  of  information 
than  observation  of  a  rare  spectral  shape  agrees  with  the  intuitive 
notion  of  information,  as  does  the  idea  that  the  observation  of  two 
spectra  with  similar  shapes  conveys  twice  as  much  information  as 
observation  of  one.  Suppose  the  probability  of  that  shape  is  p;  then 
the  probability  of  n  spectra  with  that  shape  is  P  =  pn.  Consequently, 
the  desired  additivity  of  information  is  guaranteed  if  it  has  the  form 


W  =  k(nP 


where  k  is  a  constant  and  In  stands  for  the  natural  logarithm,  since 
it  is  a  property  of  logarithms  that 

t-n  pn  =  n  In  P 

There  is  no  particular  significance  to  the  value  of  k  or  to  the  choice 
of  logarithmic  base  other  than  to  define  units  of  information.  It  is 
customary  to  take 

W  =  -  In  P  . 

The  minus  sign  is  necessary  because  information  increases  as 
probability  decreases. 

Now,  a  generalization  of  the  simple  probability  p,  is  the  proba¬ 
bility  distribution  f  (<&).  If  we  imagine  a  space  of  all  possible  spectra 
o  such  that  each  ^  is  a  point  in  the  space,  and  if  dQ  is  an  infinitesi¬ 
mal  volume  surrounding  such  a  point,  then 

dp  =  f(<p)  dP* 

is  the  infinitesimal  probability  of  finding  5  in  dfl.  That  is  a  definition. 
We  can  turn  it  around,  write 


and  think  of  f (<P)  as  the  density  of  probability  at  *P .  The  corresponding 
generalization  of  information  is  the  information  function 
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W  =  -  In  f 


As  before,  information  is  additive  when  probability  is  multiplicative. 
Notice  a  very  important  property  of  the  information  function  that 
the  point  of  minimum  information  corresponds  to  the  point  of 
maximum  probability. 

The  information  function  required  for  unfolding  is  complex 
conceptually,  because  it  involves  two  random  states,  the  data  and 
the  image.  Let  us  next  consider  this  complication,  first  from  the 
viewpoint  of  probabilistic  arguments  that  motivate  it,  and  secondly 
from  the  viewpoint  of  intuitive  notions  of  why  it  works.  The  coordinate 
(E)  and  the  analogous  subscript  (i)  are  unimportant  to  this  discussion, 
and  will  be  dropped.  What  is  most  important,  is  the  fact  that 
object,  data,  and  image  are  random  variables. 

We  have  already  considered  the  fact  that  before  measurement 
the  object  function  is  random  relative  to  available  knowledge.  We 
called  its  probability  distribution,  f(<P).  After  measurement  ® 
is  not  reduced  to  certainty.  It  is  still  a  random  variable,  but  its 
probability  distribution  f ( <pj  )  is  reduced  in  size.  We  have  not 
discussed  the  size  of  probability  distributions  and  will  not,  but  the 
precise  notion  defined  in  terms  of  entropy  corresponds  to  the 
intuitive  notion  that  a  probability  distribution  is  smaller  or  tighter 
when  the  range  of  probable  variation  is  decreased.  In  the  distribution 
f(£|yi),  known  as  the  "conditional  probability  of  <P-if-$",  ®  is 
thought  of  as  a  random  variable,  and  T  is  thought  of  as  fixed. 
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Of  course  it  is  also  possible  to  think  of  the  data  as  random. 

If  a  measurement  of  a  known  state  ®  is  made,  say  of  a  test  source, 
the  data  that  would  result  from  an  ensemble  of  separate  measure¬ 
ments  with  the  gamma  camera  would  be  distributed  probabilistically 
with  some  distribution  f(T>|<5),  primarily  due  to  the  discrete  nature  of 
the  gamma  flux.  This  time  V  is  thought  of  as  the  random  variable 
and  <P  is  thought  of  as  fixed,  and  the  resulting  probability  distribution 
f(?\o)  is  called  the  conditional  probability  of  X>-if-co.  It  is  important 
to  realize  that  f(yjo)  contains  a  mathematical  model  of  the  camera 
response  as  well  as  of  the  fluctuation  process,  since  both  are 
necessary  for  defining  the  distribution  of  T>.  It  is  also  worth  re¬ 
marking  that  so  far  the  symbol  f  has  been  used  to  specify  three 
di  ferent  distributions,  f(<£),  ^),  and  f (3  [to),  with  the  hope 

that  the  different  argument  structures  will  be  sufficient  to  distinguish 
them.  The  freest  possible  variation  of  ?  occurs  when  <0  is 
unspecified.  The  associated  distribution,  which  of  course  is  larger 
than  f(3)|o),  we  call  f(S'). 

One  more  distribution  involving  and  T:  if  nothing  is  known, 
neither  the  state  of  the  source  nor  the  state  of  the  data,  then  both 
must  be  treated  as  random.  A  product  space  may  be  formed  such 
that  each  point  in  space  corresponds  to  a  pair  (o,T).  The  associated 
probability  density  f(<P,  V)  is  known  as  the  joint  probability  of  <P 
and  T>. 

Now  this  distribution,  the  joint  distribution  of  o  and  T,  is  a 
complete  specification  of  the  random  properties  of  ®  and  3),  and 
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therefore  determines  the  four  previous  distributions.  The  distribu¬ 
tion  of  <P  is  just  the  joint  distribution  of  <?>  and  T>  with  integration 
over  all  possible  3): 

f(o)  =  f(o,T)  dO  . 

all  T  ® 

The  distribution  of  3)  is  the  joint  distribution  of  5  and  3)  with 
integration  over  all  possible  <P : 

f(T)  =  J...  J  f(o,£)  . 

<P 

The  relationship  with  the  conditional  probability  of  <P-if-3)  is 
f(o,T)  =  f(5  |  3))  f(S?)  , 

because  the  probability  of  S'  being  an  infinitesimal  volume  dfy.,  about 
3)  is 

m  , 

while  the  probability  of  ®  being  simultaneously  in  d£2  about  is 

V 

f(o  |  3))  dfi_  , 
o 

the  compound  probability  being 

f(<P  ,  35)  dQ_  dQ?  =  f(  o  |  3?)  dft_  f(T)  dn  . 
o  tp 


15 


Bv  the  same  token,  the  relationship  with  the  conditional  probability 
of  ®-if-P  is 

f(<P  ,®)  =  f(®  |  <P  )  f(<P  )  . 

Finally,  the  fact  that  the  two  conditional  probabilities  are  related  to 
the  joint  probability  means  that  they  are  themselves  related  by  the 
very  important  equation 

f(P  |®)  f(®)  =  f(®|  p)  f(p)  . 

This  implies  that  any  one  cf  the  conditional  or  single  probabilities 
can  be  expressed  in  terms  of  the  other  three. 

All  the  probability  distributions  have  an  analogous  information 
function  defined  by  the  logarithmic  relationship 


W  =  -  In  f 


mentioned  previously.  Here  they  are:  The  information  function  of 
W(<p);  and  the  information  function  of  ®,  W(®).  The  conditional 
information  function  of  <p  -if-®,  W(<p  ®);  and  the  conditional  informa¬ 
tion  function  of  ®-if-o,  W(®|<p).  The  joint  information  function  of 
<p  and  ®,  W(o,  ®).  Again  we  mention  that  the  same  symbol,  W, 
stands  for  different  functions  distinguished  by  their  argument 
structure.  The  multiplicative  relationships  involving  conditional 
probability  become  additive  relationships  involving  conditional 
information,  as  we  can  see  by  the  following: 
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* 


W(</5  ,35)  =  W(</5  |  3))  +  W(35) 

=  W(35|p)  +  W(£)  , 

with  any  one  of  the  conditional  or  single  variable  information 
functions  being  expressible  in  terms  of  the  other  three.  These  are 
the  basic  mathematical  elements  to  be  used  in  the  construction  of 
the  optimization  function. 

One  small  obstacle  remains,  the  fact  that  all  these  information 
functions  involve  <5  and  3)  instead  of  <P  and  $  as  required  by  the 
unfolding  processor.  The  arguments  and  3)  are  reasonable 
starting  points  for  any  theoretical  investigation  of  the  information 
flow  process,  because  both  are  firmly  grounded  in  the  physical 
nature  of  the  source  in  question  and  of  the  spectrometer  and  therefore 
lead  to  physically  motivated  probability  distributions  and  information 
functions,  while  <p  is  a  synthetic  object,  a  construction  of  the 
unfolding  processor.  There  are  no  physically  motivated  probability 
distributions  or  information  functions  involving  </?,  except  those 
generated  by  fiat. 

Here  is  that  fiat.  If  the  distributions  involving  o  and  35  were 
known,  it  would  be  possible  to  construct  a  Monte  Carlo  unfolding 
processor  which,  using  the  known  distributions  of  o  and  35,  generated 
random  o  in  such  a  way  that  all  probability  distributions  or  informa¬ 
tion  functions  involving  and  35  were  the  same  as  those  involving 
o  and  3).  This  would  be  a  processor  different  from  the  optimizing 
processor,  because  its  output  would  be  random,  but  it  is  valuable 


f 
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conceptually  because  it  represents  a  probabilistic  inverse  of  the 
spectrometer  measurement  process,  the  image  distribution  for 
given  data,  f (<£>  |  $),  being  the  same  as  the  object  distribution, 
f(o  $)  for  tha^  data.  Therefore,  let  all  distributions  involving  <P 
be  generated  from  those  involving  <p  by  the  replacement  rule 

To  return  to  the  determination  of  the  optimum  image,  the  optimum 
image  is  that  image  function  which  corresponds  to  the  most  probable 
object  function  for  the  observed  data.  It  is  therefore  the  image 
function  that  minimizes  the  conditional  information  of  <p-if-S), 

W(<p  j  $).  This  is  a  straightforward  and  precise  statement  of  the 
principle  of  unfolding  which  combines  the  previously  mentioned 
principle  of  optimization  with  a  specific  form  of  the  W-criterion. 

The  construction  of  W(<p|  55)  by  direct  consideration  of  the 
conditional  probability  of  <p -if- 55  is  usually  impossible.  Theoretical 
models  generally  follow  the  true  direction  of  information  flow,  that 
is  not  from  $  to  <p,  but  from  to  5).  Remember  that  f(3)|  <p)  is  the 
distribution  and  therefore  W(5>  a)  the  information  function  which 
contains  a  (usually  linear)  mathematical  model  of  the  spectrometer 
response  as  well  as  of  the  statistical  fluctuations.  This  requires 
consideration  of  the  conditional  distribution  of  data  $  while  the 
object  function  is  held  fixed,  a  point  that  will  become  less  abstract 
when  we  consider  in  Section  4  a  specific  form  for  the  iriormation 
function.  To  do  this  we  use  the  relationships  among  the  conditional 
and  single  variable  information  functions  to  write 

W(<p  |  T)  =  W(25 1  <p)  +  W(<p)  -  W(E)  . 
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Since  the  minimization  of  W ( <p  1 3D )  occurs  by  varying  p,  the 
information  function  W($)  acts  like  a  constant  and  may  be 
dropped.  Then 

w  =  W(»  |  < p)  +  W(p) 
is  the  function  tnat  is  minimized. 

One  of  the  nicest  properties  of  t^e  information  function  W, 
consistent  with  intuition  and  easy  to  remember,  is  its  additivity. 

The  form  of  W,  a  sum  of  two  terms,  may  be  interpreted  as  meaning 
that  information  comes  from  two  sources.  The  function  W($  p)  is 
often  called  the  a  posteriori  information,  anc  represents  information 
that  became  available  upon  measurement.  As  p  varied  from  its 
initial  form  toward  the  minimum  of  W(®  p),  and  in  many  i-erational 
techniques  this  means  from  a  relatively  structureless  form  to  a 
sharply  varying  spectrum,  p  is  brought  closer  and  closer  toward 
consistency  with  data  Consistency  is  a  good  thing  to  a  degree, 
but  beyond  some  limit  P  ceases  to  acquire  meaningful  detail  from  $ 
and  instead  acquires  noise.  Since  the  spectrometer  introduces 
blurring  of  5)  relative  to  p,  and  since  W(®  P)  is  constructed  in 
such  a  way  that  P  is  consistent  with  ®  when  it  is  relatively  sharp 
with  respect  to  ®,  p  contains  high  spatial  frequency  components 
and  is  extremely  susceptible  to  the  acquisition  of  noise.  This  is 
where  W(p)  comes  in.  We  know  from  our  understanding  of  physics 
that  it  is  extremely  unlikely  for  noisy  random  fluctuations  and  typical 
source  spectra  to  look  alike.  There  is  a  certain  range  of  reasonable 
spectral  shapes.  This  is  information  represented  by  W(p), 
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important  information  such  as  the  tendency  for  peaks  to  assume  a 
Gaussian  vai  ition.  This  is  a  priori  information,  known  prior  to 
measurement.  The  total  information  is  the  sum  of  information  known 
before  anJ  indep  ndently  of  the  measurement  and  information  acquired 
upon  measurement. 

To  expand  on  the  point  raised  In  the  preceding  paragraph  con¬ 
cerning  consistency  with  the  data,  an  image  is  consistent  with  its 
associated  data  if  it  predicts  that  the  data  is  probable,  that  is,  if 
knowledge  of  the  camera  response  and  the  nature  of  the  statistical 
c  tunting  fluctuations  leads  from  an  object  like  the  proposed  image 
to  a  data  set  close  or  similar  to  the  observed  data  set.  Than  an 
understanding  of  the  spectrometer  response  and  the  distribution  of 
fluctuations  allows  one  to  assume  an  object  spectrum  and  compute 
the  probability  of  a  given  data  set  will  be  discussed  further  in  the 
next  section.  The  negative  logarithm  of  this  probability  is  the 
a  posteriori  information.  It  is  a  function  of  the  estimated  image  <p 
and  the  data  3) ,  but  also  of  the  size  of  the  data  fluctuations  and  of 
the  instrumental  response.  The  more  probable  the  observed  data, 
the  more  consistent  the  estimated  image,  and  the  smaller  the  value 
a  posteriori  information.  A  value  of  a  posteriori  information  can 
be  chosen  such  that  all  spectra  with  a  posteriori  information  smaller 
than  this  value  are  considered  consistent  and  all  images  with 
a  posteriori  information  greater  than  this  value  are  considered 
inconsistent.  This  defines  a  geometric  region  of  consistency  within 
the  space  of  possible  images. 
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This  is  not  quite  true,  due  to  a  geometrical  condition  present 
in  high-dimensional  spaces  such  as  the  hi gh -dimensional  data  space, 
but  with  no  analog  in  tho  three-dimensional  space  our  intuition  is 
based  upon.  In  a  hign-dimensional  spree  nearly  all  the  volume  of  a 
smoothly  bounded  region  is  concentrated  at  the  boundary  of  the 
region.  It  is  as  if  all  solid  objects  had  nothing  inside.  This  may 
sound  strange,  but  its  truth  is  easily  demonstrated  in  the  case  of  an 
n-dimensioral  sphere.  If  the  volume  of  an  n-dimensional  sphere  is 
obtained  hy  integrating  the  volume  of  a  series  of  infinitesimally  thin 
concentric  shells,  the  result  is 

V  =  a  [  r11-*  dr  , 

where  a  is  a  constant.  The  contribution  of  any  infinitesimal  shell, 
n- 1 

ar  dr,  rises  extremely  rapidly  with  radius  because  r  is  raised 
to  a  high  power.  For  a  100-dimensional  vector  data,  99  percent  of 
the  spherical  volume  is  concentrated  within  the  outermost  1  percent 
of  radius!  This  effect  is  sometimes  known  as  "the  curse  of  many 
dimensions".  Its  consequence  is  the  concentration  of  nearly  all 
data  sets  for  a  given  true  <P  at  the  boundary  of  the  region  of 
consistency. 

The  information  value  determining  this  boundary  can  be  deduced 
from  the  random  properties  of  the  a  posteriori  information  function. 

If  some  fixed,  known  <p  generates  an  ensemble  of  data  sets  by  an 
ensemble  of  separate  measurements,  all  of  the  generated  data  sets 
are  known  to  be  consistent  with  <p  by  the  very  nature  of  the  experiment, 
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but  some  will  appeav  to  be  inconsistent  with  a  large  a  posteriori 
information  value,  due  to  statistical  fluctuations  in  $.  After  all, 
the  a  posteriori  information,  being  a  function  of  the  random 
variable  ®,  is  itself  a  random  variable.  On  the  average,  some 
expectation  value  <W($jp)>  will  be  attained.  In  addition,  W(®|p) 
acts  as  a  r?dius,  defining  shells  in  S  -space  of  the  form 

Wj  s  <W(S>|  p)>  s  Wj  +  dWj  , 

so  that  as  a  consequence  of  the  curse  of  many  dimensions  it  is  a 
good  approximation  to  say 

W(T|o)  =  <W(£|  o)>  . 

Th"1  chance  of  this  equality  failing  to  a  given  accuracy  diminishes 
rapidly  with  increasing  dimensionality. 

In  the  light  of  this  nonintuitive  behavior  at  high  dimensionality, 
some  statements  concerning  consistency  with  the  data  should  be 
reviewed  and  revised.  In  the  optimization,  as  p  is  varied  from 
its  initial  form  toward  the  minimum  of  W($|p),  which  in  many 
iterational  techniques  means  from  a  relatively  structureless  form 
to  a  detailed  image,  p  is  probable  brought  closer  and  closer  toward 
consistency  with  the  measured  data  $  until 

W($  |  p )  =  <  W(S)  |p))  . 

[Since  <p  is  unknown,  <W(S)|  p)>  can  be  approximated  by  <W(S  <p)>. 
Anyway  it  is  usually  found  to  be  approximately  constant,  independent 
of  p.J  Beyond  this,  further  optimization  of  W(S  p)  is  fruitless, 


because  <£>  probably  becomes  less  consistent  with  $ .  As  a  result 
the  optimization  of  W($  o)  alone  would  end  with  <p  confined  to  a 
subspace  of  the  space  of  all  possible  <p  defined  by  the  expectation 
value  of  the  a  posteriori  information-  That  is,  <P  would  not  be 
uniquely  determined. 


A  random  choice  of  o  from  the  subspace  of  <p  -space  defined  by 
the  equation 


W(£|<p)  =  <W(®|<p)> 


would  be  most  unsatisfactory.  The  noise  magnification  resulting 
from  resolution  enhancement  would  obliterate  all  recognizable 
features.  Good  spectra  are  present  within  the  subspace,  but  they 
represent  an  extremely  small  fraction  of  the  total  region.  It  is  the 
function  of  a  priori  information  functions  to  locate  these  small 
regions.  The  combined  optimization  of 

W  =  W(®|  <p)  +  W(<p) 


results  in  the  minimization  of  W(«p)  within  the  region  of  consistency. 
If  W (cp)  is  constructed  to  minimize  for  the  desired  class  of  spectra, 
the  desired  selection  will  be  achieved. 


The  a  posteriori  information,  W(®  <p),  determines  the  region 
ol  consistent  spectra,  and  the  a  priori  information,  W(<p),  determines 
the  most  regular  spectrum  within  the  region  of  consistency.  As  a 
practical  matter,  it  is  relatively  easy  to  construct  a  form  of  W(<p) 
that  prejudices  toward  regular  images,  and  relatively  difficult  to 


23 


normalize  it  so  that  its  relative  strength  in  W  is  exactly  that 
required  to  pull  W(T  <P)  to  its  expectation  value.  Therefoie,  it  is 
more  direct  to  state  the  optimization  problem  as 


minimize  W(<p) 


subject  to  W($  |  <p)  =  <W(§|o)> 

On  the  other  hand  it  is  relatively  difficult  to  implement  a  constrained 
optimization  like  this,  and  relatively  easy  to  implement  a  simple  or 
unconstrained  optimization.  Two  methods  have  been  developed  for 
converting  the  constrained  optimization  to  an  unconstrained  one. 

In  the  first  method  the  function  that  is  minimized  has  the  form 

2  2 

W  =  i{w(S|<p)  -  <W(D|^)>}  +  e{w(<p)|  , 

where  e  is  a  smaller  number.  The  first  term  dominates  until  the 
a  posteriori  information  approaches  its  expectation,  after  which  the 
minimization  of  the  a  priori  information  dominates.  In  the  second 
method,  the  method  of  Lagrange  multipliers,  the  function  that  is 
minimized  has  the  form 


W  =  W(£  |  (p)  +  X  W(<p) 


where  X  is  the  Lagrange  multiplier.  Generally,  the  value  attained 
by  W($|<p)  increases  as  X  increases,  causing  W(p)  to  pull  the 
optimum  <p  away  from  the  value  that  w)uld  minimize  W(S  o). 
Therefore,  the  Lagrange  multiplier  can  be  adjusted  as  part  of  the 
optimization,  to  the  value  required  to  fulfill  the  expectation  value 
condition. 
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3.  OPTIMIZATION  CRITERIA 


The  optimization  function  used  in  MAZE  2  is  designed  to 
produce  regular,  positive  spectral  images,  with  minimal  alteration 
of  the  true  widths  of  Gaussian  peaks.  MAZE3  is  similar  in  that 
sense.  In  this  section  we  describe  the  specific  form  of  this  function. 

An  NC -dimensional  energy  grid  structure  is  assumed  in  which 
th3  width  of  the  J-th  energy  bin  is  AE(J),  for  J=l, .  . .  ,NC.  This 
structure  is  specified  by  an  (NC+l)-dimensional  bin-edge  vector, 

E(J),  J=l, . . .  ,  (NC+1).  The  quantities  that  are  estimated  are  the 
probabilities  of  occurrence  of  flux  in  the  J-th  energy  channel,  an 
NC-dimensional  vector,  Q(J).  This  vector  is  related  to  the  image 
spectrum,  (E),  of  the  previous  chapttr  by  the  equation 

E  (J+l) 

Q(J)  =  f  <p(E)  dE  .  (3.  i) 

E(J) 

The  probability,  Q(J)  is  an  integral  quantity,  roughly  proportional 
to  the  width  of  tne  J-th  energy  bin,  while  the  image  spectrum,  <p  (E), 
is  a  differential  quantity,  independent  of  the  energy  grid  structure. 

Actually,  Q(J)  is  a  2*NC  -dimensional  vector  where  the  indices 
J=l, . . .  ,NC  refer  to  the  continuum  (Qc)  and  indices  J=NC+1,  . . .  ,2*NC 
refer  to  the  discrete  (QD)  portion  of  the  spectrum.  The  subscripts 
C  and  D  will  be  explic  tly  indicated  wherever  needed,  otherwise  it 


will  be  assumed  that  Q  refers  to  both  the  continuum  and  the  discrete 
elements  of  the  probability  vector  of  dimension  2*NC.  Then  QD(J) 
is  actually  Q(NC+J). 

There  is  an  NR -dimensional  data  vector,  C(I),  for  II,...  ,NR. 

It  is  assumed  to  be  linearly  related  to  the  spectral  estimate  Q, 
by  the  response  matrix,  A,  in  the  folding  equation 

C(I)  +  6C(I)  =  X>(I,J)  *  (Q(J)  +  6Q(J))  .  (3.2) 

J 

The  symbol  6C  represents  statistical  fluctuations  of  the  data  vector. 
In  Section  3. 1,  in  which  the  basic  optimization  function  is  described, 

C  is  assumed  to  have  been  obtained  by  a  simple  counting  process 
so  that  6C  is  distributed  accoiding  to  the  multinomial  distribution. 
The  assumption  is  not  generally  valid,  but  it  is  generally  possible 
to  transform  the  problem  to  a  form  in  which  the  multinomial 
assumption  yields  good  solutions.  This  is  the  equivalent  count 
form,  which  is  discussed  in  Section  3.3.  Also  part  of  the  equivalent 
count  form  is  the  condition 

NC 

+  *  (3-3) 

J~  1 

representing  the  assumption  that  all  flux  of  interest  passes  through 
the  defined  energy  grid  structure.  The  problem  is  worked  under 
condition  (3.  3)  and  restored  upon  completion  to  the  normalization 
specified  by  condition  (3. 1).  This  is  discussed  in  mathematical  detail 
in  this  section.  The  symbol  6Q  represents  error  in  the  image 
spectrum  relative  to  the  object  spectrum. 
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The  W-criterion  is  dependent  on  the  arguments  (Q;C,  A,  E,NR,NC) 
and  has  the  form  discussed  in  Section  2.  A  restatement  of  the 
variable  information  function  in  terms  of  these  arguments,  has 
the  form 


W(Q;C,  A.  E,  NR,  NC)  =  W^QjC,  A,  NR,  NC) 

+  tcW2(Qc;E,NC) 

+  tdW2(Qd;E,NC) 

where  is  the  a  posteriori  and  W2  the  a  priori  infurmation.  These 
will  be  treated  in  detail  in  the  following  subsections. 

3.  1  THE  A  POSTERIORI  INFORMATION 

The  form  of  the  a  posteriori  information  was  derived  from 
multinomial  statistics.  According  to  multinomial  statistics  the 
probability  for  C  counts  distributed  in  NR  channels  is: 

NR  pmC(I) 

PROB  =  'COUNTS!!  £  . 


where 


The  construction  of  an  information  function  as  the  negative  logarithm 
of  a  probability  distribution  was  discussed  in  Section  2,  whereupon 
it  follows  that 

NR 

W1  =  -  £  C(I)  *  Ui  P(I)  -  K  (3.1.2) 

The  constant,  K,  which  does  not  affect  Wj  as  a  function  of  Q,  has 
the  form 

NR 

K  =  In  (COUNTS!)  -  £  In  C(I)  . 

1=1 

Using  Stirling's  approximation 
MX!)  =  X  (n  X  -  X 
it  can  be  brought  to  the  form 
NR 

K  =  £  C(D  *  \ln  (COUNTS)  -  -In  C(I)f 

1=1 

Substituting  this  back  into  in  Equation  3. 1.2  and  combining  log 
terms,  we  find 

NR 

Wj(Q;C ,  A,  NR,  NR)  =  -  £  C(I)  *  m  COUNTS^Pd) 

1=1 


The  form  of  the  multinomial  distribution  assumes  that  the 
NR -dimensional  vector  P  is  the  probability  for  all  possible  channels 
of  detection.  Otherwise  there  would  effectively  be  an  (NR+l)-th  data 
channel  for  events  not  received  by  the  first  NR  channels.  This  can 
be  stated  mathematically  as  the  condition 

NR 

1  =  £  P(I)  •  (3.1.3) 

1=1 

It  has  the  same  form  as  the  previously  mentioned  condition  (3.3), 
and  is  directly  related  to  it.  Combining  Equations  3.1.1  and  3.  1.  3, 
we  get 

NR  NR  NC 

1  =  £  p(i>  =  £  £  a(i,j)  *  {q  (j)  +  q  (j)[  . 

1=1  1=1  J=1  1  L  D  ' 

Then,  reversing  the  order  of  summation,  this  becomes 
NC  NR 

1  =  £  jQc(J)  +  Qn(j)}  £  A(I,  J)  .  (3. 1.4) 

In  Section  3.3  it  will  be  shown  that  in  the  equivalent  count  form  the 
summation  of  A  over  I,  a  summation  of  the  columns  of  the  response 
matrix,  gives  the  result 

NR 

£  A(I,J)  =  1  . 

1=1 

Therefore  Equation  3. 1.  4  reduces  to  condition  3.  3. 
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Condition  3.3  constitutes  a  one-dimensional  constraint  on 
the  domain  of  optimization,  leaving  and  with  (NC-1)  degrees 
of  freedom.  In  this  constraint  region  the  calculus  of  variations 
provides  an  expression  for  the  minimum  of  W^.  Taking  the  variation 
of  Wj  with  respect  to  P  gives 


6W,  =  £ 


NR  6W.  NR  p(T. 

£  bp®  6P(I)  =  ■  £  m 6P(I) 


(3.  1.5) 


The  variation  of  the  constraint  Equation  3.1.3  gives 


NR  NR 

«  Z  P(I)  =  Z  «P(I)  =  0  .  (3.1.6) 

1=1  1=1 


Using  the  method  of  Lagrange  multipliers,  the  minimum  is  described 
by 


NR 

6W  +  X  6  £  P(I)  =  0 

1=1 


where  the  Lagrange  multiplier  X  is  to  be  determined.  Substituting 
3.1.5  and  3.1.6  into  this  equation,  it  becomes 

£  {-f$+x|6p®  -°  • 

Because  the  6  P(I)  are  independent,  we  set  each  coefficient  to  zero 
independently,  and  find 
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(3. 1.7) 


P(I) 


y 


the  Lagrange  multiplier  being  evaluated  by  summing  over  I. 
Performing  this  summation, 


NR 


z 

1=1 


pa)  =  i 


C(I)  .  COUNTS 
=I=1  X  X 


y 


from  which  X  =  COUNTS,  and 


P(I) 


C(I) 

COUNTS 


(3.  1.8) 


This  agrees  with  the  notion  that  P(I)  is  the  probability  for  a  count 
in  channel  I.  Substituting  this  P(I)  back  into  W^,  the  minimum 
value  of  is  found  to  be  zero. 

A  form  of  in  which  constraint  3.1.3  is  implicit  is  useful 
for  the  accelerated  steepest  descent  technique  used  in  MAZE.  It 
may  be  derived  by  considering  an  (NR+l)-th  dummy  channel,  channel  0, 
with  P  (0)  =  1  so  that 

NR 

£  p(D  «  i  . 

i=i 


Then,  P  (I)  for  I  4  0  is  far  removed  from  the  effect  of  the  constraint 
condition  and  can  vary  freely.  The  minimum  of  with  respect  to 
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NR 

E  p(i) 


1=1 


is  controlled  by  the  choice  of  C(0).  To  preserve  the  magnitude  of 
P(I),  the  probability  of  a  count  in  chan-  el  I  can  be  defined  as  aP(I), 
where  a-0.  Omitting  the  constant  term  from  Equation  3.1.2, 

Wj  becomes 

NR 

wj  =  -  C(0)  *  in  P(0)  -  £  C(I)  *  in  aP(I)  . 

1=1 

The  count  total  becomes  C(0)  +  COUNTS,  and  constraint  3.1.3 
becomes 


NR 

P(0)  +  a  £  P(I)  =  1 
1=1 


(3.  1.9) 


According  to  Equation  3. 1. 7  obtained  by  calculus  of  variations,  the 
minimum  of  occurs  at  the  P  values 

p(0)  =  C(0) _ 

V  '  C(0)  +  COUNTS  ’ 

aPtll  =  C(I) 

P(I)  C(0)  +  COUNTS  * 

The  latter  expression  must  agree  with  the  original  minimum  3.1.8, 
so  C(0)  must  be  set  to 
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C(0)  =  —  *  COUNTS  . 

Using  this  expression  for  C(0)  and  the  constraint  condition, 

Equation  3.1.9,  the  a  posteriori  information  becomes 

1  (  NR  )  NR 

W1  =  -  ^  *  COUNTS  *  in  1-aJ  P(I)  -  E  C(I)  *  In  aP(I) 
1  (  1=1  )  1=1 

In  the  limit  a-0  the  first  -tn  term  is  well  approximated  by  a  first 
order  expansion 

(  NR  )  NR 

tn  1  -  a  E  P(I)  a  E  P(I)  • 

(  1=1  J  1=1 


Substituting  this  back  into  the  equation  for  W^,  replacing  Q!P(I)  by 
its  original  form,  P(t) ,  and  taking  the  limit  as  a-+d,  becomes 


NR  NR 

W.  =  COUNTS  *  E  ptf)  -  E  C(I)  *  ln  p(!) 

1=1  1=1 


This  is  the  implicit  constraint  form  of  Wj.  With  this  form  the 
minimum  of  occurs  at  the  equation  given  by  3.1.8  without 
Lagrange  multipliers. 

3.2  THE  A  PRIORI  INFORMATION 

The  a  priori  information  Wg  is  designed  to  suppress  oscillations 
but  not  Gaussian  peaks.  Since  oscillations  are  accompanied  by 
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lai\q;e  derivative  terms,  the  reduction  of  almost  any  low  order 
derivative  function  will  reduce  oscillations.  Gaussian  peaks  also 
contribute  certain  large  derivative  terms,  and  a  careless  choice  of 
Wg  will  cause  Gaussian  peaks  to  be  smeared.  Since  a  compromise 
is  affected  between  smoothing  and  conformity  to  the  data,  the 
practical  result  of  using  a  bad  Wg  is  oscillations  near  Gaussian 
peaks  appearing  as  subsidiary  peaks  and  possible  negative  fluxes,  and 
smearing  of  the  true  peak.  For  this  reason,  care  must  be  exercised 
to  construct  a  derivative  expression  that  leaves  Gaussian  peaks 
intact. 


The  a  priori  information  functions  have  the  form 


W,(Q_;E,NC)  =  i  £ 


NC  [dtn„c(E  )| 


W2(Qd;E,NC)  = 


NC  [d  tnfiJE,) 

i  £  - iLi. 

J=1  dE 


for  the  continuum  and  discrete  spectra  respectively.  The  derivative 
foims  are  approximately  independent  of  the  properties  of  the 
Gaussian  peaks.  Thus,  minimization  results  in  regularization 
without  alteration  of  the  peaks.  The  Ej  is  an  energy  representative 
of  the  J-th  energy  bin  and  the  derivatives  are  in  terms  of  channels. 

It  is  assumed  that  energy  is  linear  as  a  function  of  channel  which 
is  a  good  approximation  for  germanium  detectors. 


34 


J 


These  forms  may  be  written  approximately  as 


NC  (  NC  )2 

w,  •=  4  z  E  D  (J,K)  *  log  PHI(K) 
J=1(K=1  “ 


where  oc  is  to  be  replaced  by  C  or  D  for  the  continuum  or  the 
discrete  parts  respectively.  Also,  o  (E^)  has  been  replaced  by 
PHI(K).  and  are  matrices  as  will  be  shown  below. 


Another  form  of  this  equation  is: 


NC  NC 

w,  =  4  £  £  log  PHI(J)  *  D  D  (J  K)  *  log  PHI(K) 

1  J=1  K=1  “  “ 


t 


t 


where 


D  D  (J  K) 
a  as  ’  ' 


NC 

E 

L=1 


D  (L,  J) 
cr  ' 


Da(L,K) 


T 

D1  *  D 
a;  a 


and,  again,  a  is  to  be  replaced  by  C  or  D. 

The  matrix  is  independent  of  PHI  and  can  be  calculated 
once  and  for  all.  For  the  first  derivative  we  have  the  following: 


t 


f 
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which  turns  out  to  be  an  approximation  to  the  second  derivative. 
The  other  off-diagonal  elements  are  zero. 


Similarly,  for  the  second  derivative: 
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which  is  an  approximation  to  the  fourth  derivative.  Because  the  D 
matrices  are  highly  redundant,  they  need  not  be  stored  explicitly. 

3.3  TRANSFORMATION  TO  EQUIVALENT  COUNTS 

The  multinomial  distribution  has  the  desirable  property  of 
distributing  C(I)  among  positive  values 

0  §  C(I) 

in  accordance  with  assumptions  that  are  reasonable  for  a  simple 
event-counting  process.  The  standard  deviati  :>n  of  the  I-th  channel 
data  may  be  deduced  from  the  multinomial  distribution,  and  has 
the  form 


o(I)  =  /COUNTS  *  'p(l)  *(1.  -  P(I))' 

For  multichannel  data  P(I)  is  usually  sufficiently  smaller  than  1. , 
to  make  the  approximation 

o(I)  =  /COUNTS  *  P(I)' 
or,  what  is  the  same 


o(i)  =  /<c(ijT 


where  <C(I)>  denotes  the  expectation  of  C(I).  Therefore,  the 
fractional  error  in  C(I)  is 


9 
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or,  approximately, 

1 

sm 

Generally,  data  D(I)  is  inputed  with  error  sigma  (I)  such  that 
the  fractional  error 

SIGMA(I) 

m 


is  not  equal  to  the  multinomial  value 
1 

/d(!7 


However,  if  a  pseudocount  vector  C(I)  is  constructed  from  D (I)  and 
SIGMA (I)  according  to  the  prescription 


C(I)  = 


D(I) 

SIGMA(I) 


9 


then  the  pseudocount  vector  will  have  the  desired  fractional  error. 


Now  let  us  consider  the  ramifications  Ci  the  pseudocount 
definition.  Suppose  V  is  a  vector  that  folds  to  D: 


NC 

D(I)  =  £  A(I,  J)  *  V(J) 
J=1 


If  a  new  response  matrix  is  defined  by  the  transformation 


A(I,  J)  -A(I,J)  =  ^  A(I,J) 


(3.3.  1) 
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then  V  will  fold  to  C: 


NC 

C(I)  =  E  A(I,J)  *  V(J)  . 

J=1 

If  A  is  again  redefined  by  the  transformation 

A(I,  J)  ->A(I,J)  =  ^i.J)  (3.3.2) 

E  A(I,J) 

1=1 

and  V  by  the  inverse  transformation 

(nr  j 

V(J)  -> V(J)  =  E  A(I,  J)  V(J)  ,  (3.3.3) 

(1=1  ) 

then  the  new  A  will  have  the  property  that 
NR 

£  A(I,J)  =  1  . 

1=1 

Upon  input  to  the  program,  pseudocount  vector  C  is  calculated, 
and  A  is  converted  to  the  equivalent  count  form  defined  by  (3.3. 1) 
and  (3.3.2).  The  spectral  estimate  therefore  contains  the  vector 
multiplier  seen  on  the  right  side  of  (3.3.3).  Upon  conclusion  of  the 
calculation,  this  multiplier  is  divided  out  and  A  is  restored  to  its 
original  form. 
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t 
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4.  IMPLEMENTATION  OF  THE  OPTIMIZATION 

The  minimization  of  W  is  accomplished  by  an  accelerated 
steepest  descent  moving  in  the  space  of  tnQ.  To  understand  this, 
consider  first  the  simple  steepest  descent,  an  iterational  technique 
which  generates  a  sequence  of  points  Q(1\  . . .  ,Q^, ...  tha. 

approaches  the  minimum  of  W.  The  method  is  known  by  knowing 
the  starting  point,  Q^,  and  a  rule  for  the  transition 
The  starting  point  is  taken  to  be  that  Q  for  which  the  image 

spectrum  is  constant  or  flat.  This  choice  assures  that  any  structure 

/  \  J 

that  appears  in  Q  comes  only  from  the  form  of  W  and  not  from 
the  choice  of  Q^.  The  rule  for  the  transition  takes 

the  form 


Wam)(J)  -  tnQ^m+1)(J)  =  tnQ^m)(J)  ♦  ta  A(m)(J) 
or,  equivalently, 


QLm>(J)  -  QLm+1)(J)  =  ea 


A(m)(J) 


(4.1) 


where  a  has  the  same  interpretation  as  in  the  previous  section.  The 
direction  of  the  transition  is  the  direction  of  steepest  descent. 

The  gradient  of  W  with  respect  to  £nQa  (J)  is 


i)W 

b  tnQ(J) 
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The  gradient  of  in  the  implicit  constraint  form  is 


5WX 
nQ  (J) 


|QC(J) 


qd(j) 


NR  , 

*  >  C(I)  -  Aft,  J)  * 

1=1  I 


CO)  I 

ml 


(4.2) 


The  gradient  of  from  Section  3.2  is 


aW2 

9 1  nQJJ) 


NC 

Qa(J)  *V  DaDa(J,K)  *  inPHI(K), 


where 

SW2 

8inQa(J) 


bW2 

bin  PHI(J) 


because  Q  (J)  differs  from  PHI(J)  by  only  a  constant  multiplier. 

/  v* 

Since  A1  is  a  direction,  it  may  be  multiplied  by  any  constant, 
and  we  take 


oWfj)  = 


i  (  6Wi 
COUNTS  \  t  tn  Q(j) 


9  Wr 


+  T 


+  T 


c  S^nQc(J)  '  ’  D  9^nQD(J) 


This  choice  keeps  the  components  of  A^  closer  to  the  value  one. 


Expanding  W  in  a  second  order  Taylor  expansion  in  Q  (J) 

/  \  OL 

about  Qa  (J)  gives  a  quadratic  dependence  with  t  in  the 
direction  A^m^.  We  let  Q^m^(J)  become  Q^m+^(J)  by  the  process 
(4. 1)  and  the  deviation  in  Qa(J)  is 
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r 


? 


♦ 


Qa(m+1)(J)  -  Q -  Qa<“)(J)  .  {  e‘^<m)W)  .  x  J  . 

(m ) 

Since  (J)  is  assumed  to  be  small,  we  expand  the  exponential 

to  the  first  order,  and  find,  approximately, 

Qimtl)(J)  -  Q^fJ)  =  Q^)(J)  <  .  (4. 

Therefore,  the  second  order  Taylor  expansion  of  W  becomes 


w(Q<,m+1)(J))  =  w(Q<(m)(J)) 


(4.4) 

The  derivatives  are  evaluated  at  Qa(m)(J)  and  B  has  the  same 
meaning  as  a . 


The  minimum  of  approximation  4.  4  occurs  where 


i>W 

6t 

a 


=  0 


(4-5) 


Since  W  is  approximated  by  4.  4  as  quadratic  in  t  ,  4.  5  is  a  linear 

oe 

equation  in  tQ ,  which  can  be  solved  to  obtain  t  ,  the  appropriate 
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i 

I 
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step  length  for  the  (m)  —  (m+l)  iteration.  However,  approximation 
4.3  holds  only  if  t  A^fJ)  is  small  with  respect  to  the  value  1. 
Otherwise,  the  argument  leading  to  4.  5  breaks  down.  To  prevent 
this,  we  modify  4.  5  to  the  following: 


a 


NC 

[t  A(m)(J) 

w+bE 

Oi 

- 5 - 

I  J=1 

(4.6) 


,(m), 


The  additional  term  is  a  chi -squared  expression  with  taA^“'(J) 
having  a  mean  of  zero  and  a  standard  deviation  of  0.  5  about  zero. 
Therefore,  4.  6  contains  both  a  tendency  to  minimize  W  and,  in 
balance,  a  tendency  to  hold  approximation  4.  3  within  its  range  of 
validity. 


To  obtain  the  step  length,  t^,  we  combine  4.  4  and  4.  6.  Then, 


t 


D,C 

NC 


(  TyJP  9 

)  ^  A(m)/TX^(m)/TN  b  W  A(m)/T^  ^(m)/T^ 

j, k=i  Q“  A  (K)  %  (K) 


.-1 


NC 


+ 


g  l*(m)y>fj  *  !‘|  QDm)  A<m)<J>- 


NC 

£ 

j=i 


e,  Qcm)<J)  A(m)<J){ 


(4.7) 


where  the  first  term  in  brackets  is  a  2  x  2  matrix  and  the  second  term 
is  a  vector.  The  first  derivative  has  already  been  expressed  in 
Equation  4.2.  The  second  derivative  has  the  form 


(4.8) 


i>2W 


NR  pm  D  D  (J,K) 

=  £  p^y**2  A(I’ J)  A(I’K)  +  ra 

Ot  p 


where  a  and  0  take  on  values  C  and  D  in  turn.  The  two  bracketed 
terms  of  4.  7  are  obtained  from  the  derivative  expressions  by 
performing  the  summations  indicated  in  4.7,  a  straightforward 
algebraic  manipulation.  When  the  first  term  of  the  second  derivative 
4.  8  containing  an  I-summation  is  combined  with  the  J,  K-summation 
in  4.7,  a  triple  summation  results: 


NC 

E 

J,K=1 


NR 

l(m)(J)  Q^m)(J)  E  A(I,  J)  A(I,K)  A(m)(K)  Q^m)(K) 


This  would  be  a  time-consuming  operation,  but  fortunately  it  can  be 
reduced  to  a  double  summation 


with  a  consequent  saving  in  computation  time. 

The  preceding  description  of  the  iteration  step  is  based  upon 
the  calculation  of  a  direction,  according  to  the  gradient  of 

W,  and  of  a  path  length,  ta,  according  to  a  modified  quadratic 
approximation  of  W.  These  quantities  are  then  used  to  transform 
the  m-th  spectral  estimate  by  transition  rule  4. 1: 

<-nQ^m)(J)  ->  tnQ^m+1)(J)  =  inQ^J)  +  toA(m)(J)  . 


An  acceleration  process  is  applied  to  the  simple  gradient  sequence 
described  thus  far.  In  the  accelerated  steepest  descent, 
alternates  between  the  gradient  form  and  the  difference  form 

A(m)(J)  =  -tnQ(m)(J)  -  Wm"3)(J) 

in  the  sequence  gradient,  gradient,  gradient,  difference,  gradient, 
difference,  gradient,  difference  . . . ;  that  is,  gradient  form  on  odd 
steps,  difference  form  on  even  steps,  except  the  second,  which  is 
gradient.  This  speeds  the  calculation  significantly.  It  decouples 
the  gradient  of  W  from  direct  control  of  the  iterational  sequence, 
and,  in  effect,  allows  the  iterational  sequence  to  build  up  momentum. 

In  Section  2,  we  discussed  the  need  for  adjusting  the  Lagrange 
multiplier,  called  in  Section  3,  to  obtain  agreement  between  the 
value  of  the  a  posteriori  information  and  its  expectation  value. 

This  is  called  the  outer  iteration  of  as  distinguished  from  the 
inner  iteration  of  W  that  has  been  discussed  in  this  section  so  far. 

A  complete  linear  iterational  sequence  is  executed  at  each  stage 
of  the  outer  iteration  associated  with  a  given  value  of  r^. 

First,  let  us  consider  the  expectation  value  of  W^.  Since 
is  a  function  of  the  random  variable  C,  it  is  itself  a  random 
variable.  Expanding  Wj  in  C  about  the  minimum  point 

C(I)  =  COUNTS  *  P(I)  , 
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becomes 


NR  6  W, 


^(Cd)  +  6C(D)  -Wjtcd))  +  E  scd) 


NR  NR 

+  i  £ 

1=1  L=1 


b2W, 


6C(I)  6C(L) 


(4.9) 


At  the  minimum  point 
W^Cd))  =  0 


The  first  derivative  of  the  explicit  constraint  form  of  W  is 


C(I)=COUNTS*P(I) 


,  COUNTS*P(I)  , 
=  -  in  - — ±-L  +  1 


lcd)=- 


=  i  . 


COUNTS*P(I) 


(4. 10) 


The  second  derivative  of  the  explicit  constraint  form  is 


j  COUNI 


<I)=COUNTS* 


P(I)  ( 


,  I=L 


,  VL  .  (4.11) 


Substituting  4.  10  and  4. 11  in  4.  9,  becomes 
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NR  m  terms2 

wi  =  |j  6C(I)  +  COUFTS^PI!)  * 

The  randomness  of  arises  from  the  randomness  of  6C(I).  H  the 
expectation  of  C (1)  is  approximated  as  COUNTS*P(I),  the  expectation 
of  6 c (I)  is  zero.  The  expectation  of  {6C(I)|^  is  COUNTS*P(I),  so 
the  expectation  of  is 

(Wj )  =  i  NR  . 

The  condition 

W^(Q)  =  expectation  Wj 

can  be  taken  as  a  test  of  the  consistency  of  Q  with  the  data.  Several 
iterational  sequences  are  executed  until  has  a  value  such  that 

expectation  Wj  <W^  <1.3  *  expectation  Wj 

This  is  the  value  which  determines  the  adjustment  of  t  . 

This  adjustment  process  is  the  iterational  sequence  associated 
with  the  outer  iteration.  After  each  inner  iterational  sequence, 
is  adjusted  according  to  the  transition  rule 

<Wj> 
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getting  larger  if  a  small  indicates  regularization  is  too  weak, 
or  smaller  if  a  large  indicates  regularization  is  too  strong. 

If,  at  any  point  in  the  outer  sequence,  the  consistency  is  good 
enough  to  permit  the  condition 

<W1)<W1  <1.3  <WX> 

to  be  obeyed,  the  outer  iteration  is  terminated. 


5.  RESPONSE  FUNCTION  PARAMETERIZATION 


The  complete  Ge(Li)  spectrum  unfolding  turns  out  to  be 
mechanically  very  tedious  since  a  great  deal  of  laboratory  work 
has  to  be  done  in  order  to  obtain  the  spectrometer  response. 
Consequently,  the  potential  users  of  the  code  that  were  contacted 
expressed  minimal  interest  in  a  code  that  would  fully  unfold  Ge(Li) 
spectra.  However,  a  great  deal  of  interest  was  expressed 
regarding  a  code  that  would  superresolve  peaks,  since  the  detector 
measurements  necessary  are  concerned  only  with  line  width 
information. 

Another  consideration  along  these  lines  is  that  the  response 
matrix  for  the  unfolding  of  Gaussian  spectral  lines  is  "nearly" 
diagonal.  That  is,  the  non-zero  values  are  concentrated  near  the 
diagonal  elements  of  the  matrix.  This  means  that  a  spectrum  that 
has  many  channels  can  be  sectioned  into  smaller,  more  reasonable 
pieces  without  taxing  the  storage  capacity  of  most  serial  computers. 
Minimal  storage  capacity  was  not  a  prime  requirement  for  the  14 
system,  however  any  actual  checks  on  the  performance  of  the  code 
on  actual  data  were  of  necessity  limited  to  cases  for  which  detector 
response  data  could  be  readily  obtained. 

One  of  the  difficulties  in  Ge(Li)  unfolding  is  the  representation 
of  the  response  matrix.  If  the  unfolding  problem  has  many  channels 
then  the  matrix  is  very  large  and  cannot  be  core  contained.  This 
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means  that  only  parts  of  the  matrix  can  be  in  the  computer  memory 
and  the  rest  shuffled  in  and  out  as  needed.  This  problem  has  to  be 
dealt  with  in  any  case  but  for  Ge(Li)  detectors  there  is  a  way  to 
parameterize  the  matrix  in  terms  of  certain  parameters.  Figure  1 
is  an  example  of  a  2  MeV  Ge(Li)  line  shape  but  without  the  Gaussian 
blurring  on  the  peaks.  The  mathematical  details  of  the  parameteriza¬ 
tion  have  been  worked  out  but  will  not  be  presented  here  in  great 
detail,  only  conceptually. 

As  detailed  in  Chapters  1  and  2,  the  information  function  can 
be  represented  as  two  parts:  the  a  posteriori  and  a  priori  as  follows. 

W  =  W1[S,A.(^t,X,a,Eo)]+ W2(^t,X,a,Eo)  ,  1=1,. ..,5 

where  the  parameters  are  only  the  ones  important  in  the  parameteriza¬ 
tion.  The  arguments  are  as  follows: 

Ai<E0)  ■  Ai(V*-CT>E0>  ' 

where  A^(Eq)  is  the  response  specifying  the  contribution  to  channel  i 
from  energy  Eq.  The  0^'s  are  the  amplitude  weight  factors  for 
the  full  energy  peak,  single  escape  peak,  double  escape  peak, 

Compton  continuum,  and  Compton  edge  corresponding  to  l  =  1,2, 3, 4,  5 
respectively.  X  is  the  logarithmic  slope  of  the  Compton  edge  and 
cr  is  the  variance  of  the  Gaussian  distribution  of  the  peaks.  It  is 
assumed  that  the  variance  is  the  same  for  all  of  the  peaks.  ®  is 
the  response  data  at  energy  Eq. 
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Probability  of  an  event 


•25  .50  .75  1.00  1.25  1.50  1.75  2.00  2.25 

Energy  (MeV) 


Figure  1.  Ge(Li)  line  shape  at  2  MeV. 
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The  object  of  the  parameterization  is  to  minimize  W  with 
respect  to  A. ,  and  a  for  a  given  energy  Eq  and  response  A.  (Eq). 
The  minimization  is  an  iterative  process.  Let  y  be  a  parameter: 

y  A.  or  cr .  Then  the  Taylor  expansion  about  the  n-th  estimate 
of  y,  yn,  to  determine  the  (n+l)-th  estimate  is: 


W(yn+1)  =  W(yn)  +  4^ 

by 


/  n+1  n. 

(y  -  y  ) 


b2w 


by  n 

v=y 


,  n+1  n  ^ 

(y  -  y  ) 


To  minimize  W(yn  ),  set 

=<> 

*>y  \  i 


and  we  get 


bW(yn+1)  _  bW  b2W 

~  n+1  by  +  ~  2" 

by  n  by  n 

y=y  y=y 


,  n+1  n. 

(y  -  y  ) 


Therefore, 


n  bW/by 

~  y 2 — r 

b  W/by 
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Substituting  the  original  parameters: 


£n+1  =  j3n 


t)2W 


t>  w 


t)/3n  b  |3n 


6/3 


n 


where  the  term  in  parentheses  represents  a  matrix  whose  (£,m)-th 
element  is  (t>2W)/(t>£^ b£^),  and  (6/3 n)  represents  the  vector 
(6W)/(^),  and  represents  the  vector  /3^  coefficients. 


Also,  collecting  the  A,  and  cr  formulae: 

.  n+1  _  .  n  6  W/6  A  n 

A  ~  A  — a — - — JV 

b  W/b(An) 


n+1  n  bW/bo11 
a  =a - „ — L - - 

b2W/b(0n)2 

The  expressions  for  the  derivatives  of  W  are  fairly  complicated 
and  will  not  be  presented  here.  Starting  with  a  set:  /S  A0,  o° t 
each  one  is  iterated  in  turn  independently  until  W  is  minimized. 

This  is  accomplished  for  each  Eq  and  it  may  even  be  possible  to 
parameterize  the  resulting  parameters  as  a  function  of  energy. 
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6.  EXAMPLES 


Typically,  an  energy  distribution  of  gamma-rays  or  x-rays  is 
viewed  by  a  spectrometer  which  produces  a  data  set.  This  data 
vector  or  set  has  passed  through  a  complex  detection  process  and 
generally  suffers  from  loss  of  resolution  from  the  detection  process 
and  generally  suffers  from  loss  of  resolution  from  the  detection 
process  and  from  statistical  noise.  It  is  the  purpose  of  the  MAZE 
program  to  remove  degradation  imposed  by  the  detector,  to  minimize 
the  effects  of  statistical  fluctuations  and  to  construct  an  image 
spectrum  that  corresponds  as  closely  as  possible  to  the  spectrum 
impinging  on  the  detector  (the  object  spectrum). 

6.  1  EXAMPLE  1,  SIMULATED  DATA 

To  test  the  performance  of  the  MAZE  program,  it  is  desirable 

to  use  a  data  set  for  which  the  object  spectrum  and  the  detector 

response  are  precisely  known.  This  was  most  conveniently  done  by 

manufacturing  a  data  set.  An  object  spectrum  composed  of  three 

-Bx 

delta  functions  on  an  exponential  continuum,  Ae”  ,  was  acted  upon 
with  Gaussian  smearing  to  simulate  a  detector  response  function. 

The  full  width  at  half  maximum,  FWHM,  of  the  Gaussian  function 
was  kept  constant  across  the  spectrum.  Appropriate  statistics 
were  introduced  by  using  a  Monte  Carlo  type  code  to  accumulate 
750,  000  counts,  distributed  over  sixty  channels,  into  the  data 
spectrum. 
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The  resulting  data  spectrum,  Figure  2,  contains  a  single,  well 
resolved  peak,  two  moderately  overlapping  peaks  and  a  continuum. 
The  correct  parameters  of  the  peaks  and  the  exponential  continuum 
are  listed  in  Table  1. 


TABLE  1 


Peak 

No. 

Central 
Channel  No. 

Area 

(Counts) 

FWHM 

(Channels) 

1 

12.0 

161108+401 

5.7 

2 

36.0 

161108+401 

5.7 

3 

46.0 

120831+348 

5.7 

Exponential  Continuum  7533e"°*  013585x 

(x  in  channels) 

The  spectrum  was  submitted  to  MAZE  for  unfolding  and  the 
resulting  image  spectrum  is  shown  in  Figure  3.  The  image  spectrum 
closely  approximates  the  object  spectrum  used  to  generate  the  data 
set.  The  enhanced  peaks  approach  delta  functions  and  the  background 
is  an  exponential  with  only  minor  deviations.  Parameters  of  these 
features  are  given  in  Table  2.  The  statistical  fluctuations  presented 
in  the  data  set  have  been  substantially  reduced  in  the  image  spectrum. 


TABLE  2 


Peak 

No. 

Central 
Channel  No. 

Peak  Channel 
(Counts) 

Percent  of 
Area 

Three  Channels 
(Counts) 

Percent  of 
Area 

FWHM 

(Channels) 

1 

12.0 

139892 

87% 

158653 

98+% 

1.01 

2 

36.0 

144906 

90% 

160566 

99+% 

0.97 

3 

46.0 

103405 

86% 

118540 

98% 

1.03 

Exponential  Continuum  7584e‘0, 01404x 


The  application  of  the  detector  response  matrix  on  the  calculated 
image  spectrum  should,  if  the  image  spectrum  is  correct,  reproduce 
the  inputed  data  set  within  statistical  limits.  MAZE  performs  the 
calculations  and  obtains  these  statistical  deviations  channel  by 
channel  (Figure  4).  In  this  example  the  deviations  are  random  and 
indicate  that  the  calculated  image  spectrum  reproduces  the  data 
set  quite  well  across  the  entire  spectrum. 

An  attempt  is  made  by  MAZE  during  the  unfolding  process  to 
distinguish  between  the  discrete  and  continuum  portions  of  the 
spectrum  and  to  provide  separate  outputs  for  the  discrete  portion, 
Figure  5,  and  for  the  continuum  portion,  Figure  6.  The  separation 
is  only  partially  successful.  In  Figure  5  the  peaks  are  seen  riding 
on  top  of  a  continuum  that  amounts  to  about  10  percent  of  the 
exponential  background  and  in  Figure  6  the  continuum  is,  of  course, 
aoout  10  percent  low  but  it  also  has  bothersome  oscillations.  There 
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Channel  Number 


Figure  4.  Deviations. 


is  no  inherent  reason  that  these  shortcomings  cannot  be  eliminated 
and,  ir  fact,  a  more  recent  version  of  the  code  has  nearly  completely 
eliminated  these  effects. 

.. 

6.2  EXAMPLE  2,  SIMULATED  DATA 

A  second  spectrum  was  manufactured  to  provide  a  more  exacting 
test  of  the  MAZE  code  performance.  The  starting  point  was  an 
object  spectrum  composed  of  five  delta  functions  on  an  exponential 
continuum.  Detector  smearing  and  the  introduction  of  statistics 
were  accomplished  as  for  example  number  1  above. 

The  resulting  data  spectrum,  Figure  7,  contains  a  single  well 
resolved  peak,  four  highly  overlapping  peaks  and  a  continuum.  The 
parameters  of  the  peaks  and  the  exponential  making  up  of  the  data 
spectrum  are  listed  in  Table  3. 

i 

TABLE  3 


Peak 

No. 

Central 
Channel  No. 

Area 

(Counts) 

FWHM 

(Channels) 

1 

12.0 

152229+390 

5.  7 

2 

31.0 

15223+  123 

5.7 

3 

36.0 

152229+390 

5.7 

4 

41.0 

76114+276 

5.7 

5 

46.0 

114172+338 

5.7 

Exponential  Continuum  7022e"0, 013585x  (x 

in  channels) 
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The  spectrum  was  submitted  to  MAZE  for  unfolding  and  the 
resulting  spectrum  is  shown  in  Figure  8.  The  image  spectrum  shows 
considerable  enhancement,  the  multiplet,  while  not  resolved  into 
four  separate  and  distinct  delta  functions,  is  clearly  made  up  of 
four  peaks.  Parameters  of  the  spectral  feature  are  given  in  Table  4. 

TABLE  4 


Peak 

No. 

Central 
Channel  No. 

Estimated 
Peak  Area 

FWHM 

(Channels) 

Fractional  Error 
in  Peak  Area 

1 

12.0 

151974 

1.07 

-0. 17% 

2 

30.7 

16677 

5.61 

+9.  5% 

3 

36.0 

150138 

1.50 

-1.4% 

4 

41.0 

80304 

3.29 

+5.  5% 

5 

46.0 

112712 

1.41 

-1.3% 

This  version  of  MAZE  tends  to  concentrate  on  the  major  features 
of  the  spectrum  while  ignoring  to  a  large  extent  the  minor  features. 

A  second  undesirable  characteristic  is  the  tendency  to  add  structure 
to  the  spectrum  wherever  there  are  substantial  statistical  fluctuations. 
In  this  example  a  small  artificial  peak  was  added  centered  at  channel 
20.  Both  of  the  above  characteristics  have  been  eliminated  in  a 
recent  reformulation  of  MAZE  (see  Section  6.4). 

6.3  EXAMPLE  3,  Si  (Li)  X-RAY  DATA 

X-ray  fluorescence  data  is  an  example  of  data  which  is  very 
difficult  to  analyze  by  conventional  techniques.  The  energy  resolution 
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afforded  by  state-of-the-art  detectors  is  not  sufficient  to  always 
resolve  the  many  X-ray  lines  often  produced  in  the  irradiation  of 
materials.  An  example  of  such  data  is  shown  in  Figure  9.  In  this 
example  there  are  at  least  ten  Gaussian  shaped  peaks  superimposed 
on  a  Bremsstrahlung  continuum.  Because  of  the  overlapping  nature 
of  the  peaks,  the  shape  and  intensity  of  the  continuum  is  not  clear. 
However,  this  information  is  vital  in  determining  the  area  of  intensity 
of  each  of  the  discrete  lines. 

The  result  of  applying  the  spectral  enhancement  capabilities  of 
MAZE  are  shown  in  Figure  10.  Peaks  1  through  7  become  clearly 
resolved  lines  and  a  small  peak  is  indicated  just  above  peak  7.  In 
addition  to  confirming  the  suspicion  that  peak  9  was  a  doublet,  MAZE 
suggests  that  peak  8  is  also  complex.  A  reasonable  Bremsstrahlung 
continuum  (see  Figure  10)  can  be  deduced  from  the  image  spectrum 
for  use  in  determining  line  intensities.  The  effective  energy 
resolution  (FWHM)  of  the  prominent  peaks  in  the  image  spectrum 
is  of  the  order  of  41  eV  (1.46  channels)  as  compared  to  170  eV 
(6.  0  channels)  in  the  data  spectrum. 

6.4  EXAMPLE  4,  Ge(Li)  GAMMA  RAY  DATA 

The  statistics  in  the  previous  example  were  excellent,  each  of 
several  of  the  peaks  in  the  data  spectrum  contained  hundreds  of 
thousands  of  counts.  We  are  not  always  so  fortunate,  often  we  must 
analyze  data  having  poor  statistics.  An  example  of  such  a  spectrum 
is  shown  in  Figure  11.  This  data  was  taken  with  a  Ge(Li)  gamma-ray 
spectrometer  and  was  obtained  from  W.  L.  Imhof  of  the  Lockheed 
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Figure  10.  Enhanced  x-ray  fluorescence  spectrum. 
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Space  Sciences  Laboratory.  A  visual  examination  of  Figure  11 
suggests  that  peaks  1  through  8  are  real  lines  while  peaks  3A,  3A, 
and  8A  may  only  be  statistical  fluctuations. 

In  applying  MAZE  to  this  data  it  was  assumed,  as  it  was  in 
the  previous  examples,  that  the  detector  response  was  pure  Gaussian 
in  nature.  The  resulting  image  spectrum  is  shown  in  Figure  12. 

The  energy  resolution  was  enhanced  considerably  but  unfortunately 
the  code  also  enhanced  and  amplified  the  statistical  fluctuations 
thereby  introducing  many  spurious  peaks.  Therefore,  we  must 
conclude  that  this  version  of  MAZE  is  limited  to  applications 
involving  data  with  good  statistics. 

Previously  it  was  stated  that  a  recently  improved  version  of 
MAZE  has  been  developed  that  handled  statistical  fluctuations  in  a 
more  reasonable  manner.  The  improved  MAZE  code  was  applied  to 
the  identical  Figure  11  data  set.  The  resulting  image  spectrum  is 
shown  in  Figure  12.  Not  only  is  the  energy  resolution  substantially 
enhanced  but  the  statistical  noise  is  greatly  suppressed.  All  spurious 
peaks  are  eliminated.  The  only  questionable  peak  that  remains  in 
the  image  spectrum  is  5A. 
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Figure  12.  Enhanced  gamma-ray  spectrum. 
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7.  CONCLUSION 


The  program  for  Illiac  IV  has  been  written  in  Glypnir^  and 
arrangements  are  currently  being  made  to  run  MAZE  on  14.  This 
has  been  delayed  purposely  so  that  the  14  system  being  tried  is  as 
up-to-date  as  possible.  Consequently,  there  are  no  run-time 
comparisons  that  can  be  quoted;  however,  as  previously  stated,  serial 
computers  can  only  tackle  relatively  small  (150-200  channels) 
problems  because  of  memory  limitations.  The  memory  of  14  is  also 
limited,  nonetheless  because  of  the  improved  representation  of  the 
response  matrix  it  may  be  possible  to  run  a  problem  with  256 
channels  on  14  that  is  core  contained.  For  higher  number  of  channels 
the  matrix  will  be  sectioned  and  buffered  in  and  out  of  PE  memory. 

The  response  matrix  is  assumed  to  be  nearly  diagonal  (see 
Section  5)  and  only  the  first  64  non-zero  values  of  each  row  are 
kept.  This  allows  the  inclusion  of  structure  into  the  matrix  that 
is  not  strictly  Gaussian.  It  also  allows  change  of  resolution  as  a 
function  of  energy. 

There  is  an  index  associated  with  each  row  which  indicates  the 
column  number  of  the  first  non-zero  element.  Each  PE  contains  one 
element  of  the  row  and  thus  a  whole  row  is  available  for  computation. 
Experience  indicates  that  64  is  a  high  enough  number  of  non  -zero 
elements  to  keep  in  the  response  matrix  for  most  foreseeable 
applications.  That  number  was  obviously  selected  because  of  the 
nature  of  14. 
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